Processing Joins with User-Defined Functions

Authors

  • Volker Gaede
  • Oliver Günther
Abstract

Most strategies for the computation of relational joins (such as sort-merge or hash join) face major difficulties if the join predicate involves complex, user-defined functions rather than just simple arithmetic comparisons. In this paper, we identify a class of user-defined functions that can be included in a join predicate such that a join between two sets R and S can still be computed efficiently, i.e., in time significantly less than O(|R| · |S|). For that purpose, we introduce the notion of the -function, an operator that processes each set element separately with respect to the user-defined function(s) being used. Any particular join query containing those functions can then be computed by a variation of some traditional join strategy. After demonstrating this technique on a spatial database example, we present the results of a theoretical analysis and a practical performance evaluation.
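The abstract does not spell out the authors' operator, so the following is only a minimal sketch of the general idea, under a loudly stated assumption: a hypothetical per-element function maps each element to a set of hashable keys such that the user-defined predicate can hold only for pairs sharing a key. A conventional hash join on those keys then yields candidate pairs, and the expensive predicate is evaluated only on the candidates. The names `keyed_join`, `grid_cells`, `overlaps` and the cell size of 2.0 are illustrative inventions, not from the paper; the spatial example uses a uniform-grid filter-and-refine scheme, which is a standard technique and not necessarily the construction the paper describes.

```python
import math
from collections import defaultdict
from typing import Callable, Hashable, Iterable, List, Sequence, Tuple, TypeVar

R = TypeVar("R")
S = TypeVar("S")
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def nested_loop_join(rs: Iterable[R], ss: Sequence[S],
                     pred: Callable[[R, S], bool]) -> List[Tuple[R, S]]:
    """Baseline: calls pred on every pair, i.e. |R| * |S| evaluations."""
    return [(r, s) for r in rs for s in ss if pred(r, s)]


def keyed_join(rs: Iterable[R], ss: Sequence[S],
               pred: Callable[[R, S], bool],
               keys_r: Callable[[R], Iterable[Hashable]],
               keys_s: Callable[[S], Iterable[Hashable]]) -> List[Tuple[R, S]]:
    """Filter with a hash join on per-element keys, then refine with pred.

    Hypothetical assumption: pred(r, s) can be true only if keys_r(r) and
    keys_s(s) share at least one key; otherwise matches would be missed.
    """
    # Build phase: each element of S is mapped to its keys independently.
    buckets = defaultdict(list)
    for j, s in enumerate(ss):
        for k in set(keys_s(s)):
            buckets[k].append(j)

    result = []
    # Probe phase: each element of R is also mapped to its keys independently.
    for r in rs:
        candidates = set()
        for k in set(keys_r(r)):
            candidates.update(buckets.get(k, ()))
        # Refine: evaluate the expensive predicate only on candidates,
        # visiting S in index order so the output matches the nested loop.
        for j in sorted(candidates):
            if pred(r, ss[j]):
                result.append((r, ss[j]))
    return result


def overlaps(a: Rect, b: Rect) -> bool:
    """Stand-in for a user-defined spatial predicate (closed rectangles)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def grid_cells(rect: Rect, cell: float = 2.0) -> List[Tuple[int, int]]:
    """Hypothetical key function: the uniform grid cells a rectangle covers.

    Two overlapping rectangles share a point, and that point's cell is
    covered by both, so the assumption required by keyed_join holds.
    """
    xmin, ymin, xmax, ymax = rect
    return [(i, j)
            for i in range(math.floor(xmin / cell), math.floor(xmax / cell) + 1)
            for j in range(math.floor(ymin / cell), math.floor(ymax / cell) + 1)]


rs = [(0, 0, 2, 2), (5, 5, 6, 6)]
ss = [(1, 1, 3, 3), (10, 10, 11, 11), (5.5, 0, 7, 5.5)]
print(keyed_join(rs, ss, overlaps, grid_cells, grid_cells))
# prints [((0, 0, 2, 2), (1, 1, 3, 3)), ((5, 5, 6, 6), (5.5, 0, 7, 5.5))]
```

On this toy input both joins return the same pairs, but `keyed_join` calls `overlaps` on only 2 of the 6 possible pairs. The saving depends on the key function being cheap and selective: rectangles that are large relative to the cell size land in many buckets, and the candidate set grows back toward the full Cartesian product.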

Similar papers

Choosing the Most Suitable Query Language for Using Outer Joins to Extract Data in Datalog Mode in the DES Deductive Database System

Deductive database systems are designed based on a logical data model. Unlike a relational database management system (RDBMS), in which data are stored in tables, a deductive database system stores data as facts. The Datalog Educational System (DES) is a deductive database system in which Datalog mode is the default mode. It can extract data to use outer joins with three query la...

Generic multiset programming with discrimination-based joins and symbolic Cartesian products

This paper presents GMP, a library for generic, SQL-style programming with multisets. It generalizes the querying core of SQL in a number of ways: Multisets may contain elements of arbitrary first-order data types, including references (pointers), recursive data types and nested multisets; it contains an expressive embedded domain-specific language for specifying user-definable equivalence and ...

Using q-grams in a DBMS for Approximate String Processing

String data is ubiquitous, and its management has taken on particular importance in the past few years. Approximate queries are very important on string data. This is due, for example, to the prevalence of typographical errors in data, and multiple conventions for recording attributes such as name and address. Commercial databases do not support approximate string queries directly, and it is a ...

SPARQling Pig - Processing Linked Data with Pig Latin

In recent years, dataflow languages such as Pig Latin have emerged as flexible and powerful tools for handling complex analysis tasks on big data. These languages support schema flexibility as well as common programming patterns such as iteration. They offer extensibility through user-defined functions while running on top of scalable distributed platforms. In doing so, these languages enable a...

The Multi-Operator Method: Integrating Algorithms for the Efficient and Parallel Evaluation of User-Defined Predicates into ORDBMS

There has been a long record of research on efficient join algorithms in RDBMS, but user-defined join predicates in ORDBMS are typically evaluated using a restriction after forming the complete Cartesian product. While there has been some research on join algorithms for non-traditional data (e.g. spatial joins), today's ORDBMS offer developers no general mechanism that allows them to implement user...

Publication date: 1994